knowledge snippet
IMACT-CXR - An Interactive Multi-Agent Conversational Tutoring System for Chest X-Ray Interpretation
Le, Tuan-Anh, Vu, Anh Mai, Yang, David, Awasthi, Akash, Van Nguyen, Hien
IMACT-CXR is an interactive multi-agent conversational tutor that helps trainees interpret chest X-rays by unifying spatial annotation, gaze analysis, knowledge retrieval, and image-grounded reasoning in a single AutoGen-based workflow. The tutor simultaneously ingests learner bounding boxes, gaze samples, and free-text observations. Specialized agents evaluate localization quality, generate Socratic coaching, retrieve PubMed evidence, suggest similar cases from REFLACX, and trigger NV-Reason-CXR-3B for vision-language reasoning when mastery remains low or the learner explicitly asks. Bayesian Knowledge Tracing (BKT) maintains skill-specific mastery estimates that drive both knowledge reinforcement and case similarity retrieval. A lung-lobe segmentation module derived from a TensorFlow U-Net enables anatomically aware gaze feedback, and safety prompts prevent premature disclosure of ground-truth labels. We describe the system architecture, implementation highlights, and integration with the REFLACX dataset for real DICOM cases. IMACT-CXR demonstrates responsive tutoring flows with bounded latency, precise control over answer leakage, and extensibility toward live residency deployment. Preliminary evaluation shows improved localization and diagnostic reasoning compared to baselines.
- North America > United States > Texas > Harris County > Houston (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
KidneyTalk-open: No-code Deployment of a Private Large Language Model with Medical Documentation-Enhanced Knowledge Database for Kidney Disease
Long, Yongchao, Yang, Chao, Tang, Gongzheng, Wang, Jinwei, Sui, Zhun, Zhou, Yuxi, Hong, Shenda, Zhang, Luxia
Privacy-preserving medical decision support for kidney disease requires localized deployment of large language models (LLMs) while maintaining clinical reasoning capabilities. Current solutions face three challenges: 1) Cloud-based LLMs pose data security risks; 2) Local model deployment demands technical expertise; 3) General LLMs lack mechanisms to integrate medical knowledge. Retrieval-augmented systems also struggle with medical document processing and clinical usability. We developed KidneyTalk-open, a desktop system integrating three technical components: 1) No-code deployment of state-of-the-art (SOTA) open-source LLMs (such as DeepSeek-r1, Qwen2.5) via local inference engine; 2) Medical document processing pipeline combining context-aware chunking and intelligent filtering; 3) Adaptive Retrieval and Augmentation Pipeline (AddRep) employing agents collaboration for improving the recall rate of medical documents. A graphical interface was designed to enable clinicians to manage medical documents and conduct AI-powered consultations without technical expertise. Experimental validation on 1,455 challenging nephrology exam questions demonstrates AddRep's effectiveness: achieving 29.1% accuracy (+8.1% over baseline) with intelligent knowledge integration, while maintaining robustness through 4.9% rejection rate to suppress hallucinations. Comparative case studies with the mainstream products (AnythingLLM, Chatbox, GPT4ALL) demonstrate KidneyTalk-open's superior performance in real clinical query. KidneyTalk-open represents the first no-code medical LLM system enabling secure documentation-enhanced medical Q&A on desktop. Its designs establishes a new framework for privacy-sensitive clinical AI applications. The system significantly lowers technical barriers while improving evidence traceability, enabling more medical staff or patients to use SOTA open-source LLMs conveniently.
- Health & Medicine > Therapeutic Area > Nephrology (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?
Shi, Yucheng, Yang, Tianze, Chen, Canyu, Li, Quanzheng, Liu, Tianming, Li, Xiang, Liu, Ninghao
Large Language Models (LLMs) have shown remarkable capabilities in general domains but often struggle with tasks requiring specialized knowledge. Conventional Retrieval-Augmented Generation (RAG) techniques typically retrieve external information from static knowledge bases, which can be outdated or incomplete, missing fine-grained clinical details essential for accurate medical question answering. In this work, we propose SearchRAG, a novel framework that overcomes these limitations by leveraging real-time search engines. Our method employs synthetic query generation to convert complex medical questions into search-engine-friendly queries and utilizes uncertainty-based knowledge selection to filter and incorporate the most relevant and informative medical knowledge into the LLM's input. Experimental results demonstrate that our method significantly improves response accuracy in medical question answering tasks, particularly for complex questions requiring detailed and up-to-date knowledge.
- North America > United States > Massachusetts (0.04)
- North America > United States > Illinois (0.04)
- North America > United States > California > Santa Clara County > Cupertino (0.04)
- Asia > Middle East > Jordan (0.04)
What External Knowledge is Preferred by LLMs? Characterizing and Exploring Chain of Evidence in Imperfect Context
Chang, Zhiyuan, Li, Mingyang, Jia, Xiaojun, Wang, Junjie, Huang, Yuekai, Wang, Qing, Huang, Yihao, Liu, Yang
Incorporating external knowledge into large language models (LLMs) has emerged as a promising approach to mitigate outdated knowledge and hallucination in LLMs. However, external knowledge is often imperfect. In addition to useful knowledge, external knowledge is rich in irrelevant or misinformation in the context that can impair the reliability of LLM responses. This paper focuses on LLMs' preferred external knowledge in imperfect contexts when handling multi-hop QA. Inspired by criminal procedural law's Chain of Evidence (CoE), we characterize that knowledge preferred by LLMs should maintain both relevance to the question and mutual support among knowledge pieces. Accordingly, we propose an automated CoE discrimination approach and explore LLMs' preferences from their effectiveness, faithfulness and robustness, as well as CoE's usability in a naive Retrieval-Augmented Generation (RAG) case. The evaluation on five LLMs reveals that CoE enhances LLMs through more accurate generation, stronger answer faithfulness, better robustness against knowledge conflict, and improved performance in a popular RAG case.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Media (0.52)
- Leisure & Entertainment > Sports (0.46)
RoRA-VLM: Robust Retrieval-Augmented Vision Language Models
Qi, Jingyuan, Xu, Zhiyang, Shao, Rulin, Chen, Yang, Di, Jin, Cheng, Yu, Wang, Qifan, Huang, Lifu
Current vision-language models (VLMs) still exhibit inferior performance on knowledge-intensive tasks, primarily due to the challenge of accurately encoding all the associations between visual objects and scenes to their corresponding entities and background knowledge. While retrieval augmentation methods offer an efficient way to integrate external knowledge, extending them to vision-language domain presents unique challenges in (1) precisely retrieving relevant information from external sources due to the inherent discrepancy within the multimodal queries, and (2) being resilient to the irrelevant, extraneous and noisy information contained in the retrieved multimodal knowledge snippets. In this work, we introduce RORA-VLM, a novel and robust retrieval augmentation framework specifically tailored for VLMs, with two key innovations: (1) a 2-stage retrieval process with image-anchored textual-query expansion to synergistically combine the visual and textual information in the query and retrieve the most relevant multimodal knowledge snippets; and (2) a robust retrieval augmentation method that strengthens the resilience of VLMs against irrelevant information in the retrieved multimodal knowledge by injecting adversarial noises into the retrieval-augmented training process, and filters out extraneous visual information, such as unrelated entities presented in images, via a query-oriented visual token refinement strategy. We conduct extensive experiments to validate the effectiveness and robustness of our proposed methods on three widely adopted benchmark datasets. Our results demonstrate that with a minimal amount of training instance, RORA-VLM enables the base model to achieve significant performance improvement and constantly outperform state-of-the-art retrieval-augmented VLMs on all benchmarks while also exhibiting a novel zero-shot domain transfer capability.
- North America > United States > New York > New York County > New York City (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (10 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.55)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.34)
A Multi-Source Heterogeneous Knowledge Injected Prompt Learning Method for Legal Charge Prediction
Sun, Jingyun, Wei, Chi, Li, Yang
Legal charge prediction, an essential task in legal AI, seeks to assign accurate charge labels to case descriptions, attracting significant recent interest. Existing methods primarily employ diverse neural network structures for modeling case descriptions directly, failing to effectively leverage multi-source external knowledge. We propose a prompt learning framework-based method that simultaneously leverages multi-source heterogeneous external knowledge from a legal knowledge base, a conversational LLM, and related legal articles. Specifically, we match knowledge snippets in case descriptions via the legal knowledge base and encapsulate them into the input through a hard prompt template. Additionally, we retrieve legal articles related to a given case description through contrastive learning, and then obtain factual elements within the case description through a conversational LLM. We fuse the embedding vectors of soft prompt tokens with the encoding vector of factual elements to achieve knowledge-enhanced model forward inference. Experimental results show that our method achieved state-of-the-art results on CAIL-2018, the largest legal charge prediction dataset, and our method has lower data dependency. Case studies also demonstrate our method's strong interpretability.
- Asia > China > Heilongjiang Province > Harbin (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Adaptive Retrieval-Augmented Generation for Conversational Systems
Wang, Xi, Sen, Procheta, Li, Ruizhe, Yilmaz, Emine
Despite the success of integrating large language models into the development of conversational systems, many studies have shown the effectiveness of retrieving and augmenting external knowledge for informative responses. Hence, many existing studies commonly assume the always need for Retrieval Augmented Generation (RAG) in a conversational system without explicit control. This raises a research question about such a necessity. In this study, we propose to investigate the need for each turn of system response to be augmented with external knowledge. In particular, by leveraging human judgements on the binary choice of adaptive augmentation, we develop RAGate, a gating model, which models conversation context and relevant inputs to predict if a conversational system requires RAG for improved responses. We conduct extensive experiments on devising and applying RAGate to conversational models and well-rounded analyses of different conversational scenarios. Our experimental results and analysis indicate the effective application of RAGate in RAG-based conversational systems in identifying system responses for appropriate RAG with high-quality responses and a high generation confidence. This study also identifies the correlation between the generation's confidence level and the relevance of the augmented knowledge.
"What do others think?": Task-Oriented Conversational Modeling with Subjective Knowledge
Zhao, Chao, Gella, Spandana, Kim, Seokhwan, Jin, Di, Hazarika, Devamanyu, Papangelis, Alexandros, Hedayatnia, Behnam, Namazifar, Mahdi, Liu, Yang, Hakkani-Tur, Dilek
Task-oriented Dialogue (TOD) Systems aim to build dialogue systems that assist users in accomplishing specific goals, such as booking a hotel or a restaurant. Traditional TODs rely on domain-specific APIs/DBs or external factual knowledge to generate responses, which cannot accommodate subjective user requests (e.g., "Is the WIFI reliable?" or "Does the restaurant have a good atmosphere?"). To address this issue, we propose a novel task of subjective-knowledge-based TOD (SK-TOD). We also propose the first corresponding dataset, which contains subjective knowledge-seeking dialogue contexts and manually annotated responses grounded in subjective knowledge sources. When evaluated with existing TOD approaches, we find that this task poses new challenges such as aggregating diverse opinions from multiple knowledge snippets. We hope this task and dataset can promote further research on TOD and subjective content understanding. The code and the dataset are available at https://github.com/alexa/dstc11-track5.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Hong Kong (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (12 more...)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems
Wilie, Bryan, Xu, Yan, Chung, Willy, Cahyawijaya, Samuel, Lovenia, Holy, Fung, Pascale
Grounding dialogue response generation on external knowledge is proposed to produce informative and engaging responses. However, current knowledge-grounded dialogue (KGD) systems often fail to align the generated responses with human-preferred qualities due to several issues like hallucination and the lack of coherence. Upon analyzing multiple language model generations, we observe the presence of alternative generated responses within a single decoding process. These alternative responses are more faithful and exhibit a comparable or higher level of relevance to prior conversational turns compared to the optimal responses prioritized by the decoding processes. To address these challenges and driven by these observations, we propose Polished \& Informed Candidate Scoring (PICK), a generation re-scoring framework that empowers models to generate faithful and relevant responses without requiring additional labeled data or model tuning. Through comprehensive automatic and human evaluations, we demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history. Furthermore, PICK consistently improves the system's performance with both oracle and retrieved knowledge in all decoding strategies. We provide the detailed implementation in https://github.com/bryanwilie/pick .
- Asia > China > Hong Kong (0.05)
- Europe > Belgium (0.04)
- North America > United States > New York (0.04)
- Europe > Monaco (0.04)
- Media > Music (0.68)
- Leisure & Entertainment > Sports (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Generative Knowledge Selection for Knowledge-Grounded Dialogues
Sun, Weiwei, Ren, Pengjie, Ren, Zhaochun
Knowledge selection is the key in knowledge-grounded dialogues (KGD), which aims to select an appropriate knowledge snippet to be used in the utterance based on dialogue history. Previous studies mainly employ the classification approach to classify each candidate snippet as "relevant" or "irrelevant" independently. However, such approaches neglect the interactions between snippets, leading to difficulties in inferring the meaning of snippets. Moreover, they lack modeling of the discourse structure of dialogue-knowledge interactions. We propose a simple yet effective generative approach for knowledge selection, called GenKS. GenKS learns to select snippets by generating their identifiers with a sequence-to-sequence model. GenKS therefore captures intra-knowledge interaction inherently through attention mechanisms. Meanwhile, we devise a hyperlink mechanism to model the dialogue-knowledge interactions explicitly. We conduct experiments on three benchmark datasets, and verify GenKS achieves the best results on both knowledge selection and response generation.
- North America > United States > Arkansas (0.04)
- Europe > Czechia (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > China > Shandong Province > Qingdao (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)